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Abstract 

Exascale computing promises quantities of data too large to efficiently store and 
transfer across networks in order to be able to analyze and visualize the results. 

We investigate Compressive Sensing (CS) as a way to reduce the size of the data 
as it is being stored. CS works by sampling the data on the computational cluster 
within an alternative function space such as wavelet bases, and then reconstructing 
back to the original space on visualization platforms. While much work has gone 
into exploring CS on structured data sets, such as image data, we investigate its 
usefulness for point clouds such as unstructured mesh datasets found in many fi¬ 
nite element simulations. We sample using second generation wavelets (SGW) and 
reconstruct using the Stagewise Orthogonal Matching Pursuit (StOMP) algorithm. 

We analyze the compression ratios achievable and quality of reconstructed results 
at each compression rate. We are able to achieve compression ratios between 10 
and 30 on moderate size datasets with minimal visual deterioration as a result of 
the lossy compression. 
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1 Introduction 


Large-scale computing platforms are challenged by the growing size of computational 
datasets generated by simulations. On future exascale platforms, the datasets are ex¬ 
pected to be too large to be efficiently stored and transferred across networks to analysis 
and visualization workstations, thus impeding data exploration and knowledge discovery. 
Several in-situ data compression schemes are available to rednce the size of simulation 
datasets. The compressed version of the dataset is transferred to any workstation and 
reconstructed by a scientist for analysis and visualization purposes. 

A suitable compression scheme is one that has a small impact on the running simula¬ 
tion. In other words, the code should not be significantly altered and the overhead cost 
should be very low. We propose compressive sensing (CS) as such method to compress 
and reconstruct datasets. Starting from the hypothesis that scientific data has low infor¬ 
mation density, CS is known to be fast and will provide a high spatial compression ratio. 
CS is data agnostic i.e. it does not require any knowledge of the simulation data type and 
the features contained therein, as is the case in wavelet compression. Therefore, it does 
not require the selection of a basis type and order during the compression in-situ. The 
wavelets bases are selected and computed during the post-processing stage allowing in¬ 
teractive reconstruction and visualization according to the required accuracy and quality. 
Finally, CS is non-intrusive which means that its implementation does not significantly 
alter the simulation code. 

Conventional CS theory, developed for image compression, is based on the represen¬ 
tation of lattice data using first generation wavelets. There is currently no literature on 
the applicability of compressed sensing on point-cloud data. We will extend the theory to 
encompass second generation wavelets (SGW) that can be described on point-clouds e.g. 
an unstructured mesh. To our knowledge, this extension of CS to point-cloud data has 
not been explored yet. It will require designing random matrices that are incoherent with 
SGW and establishing the restricted isometry required for unique inflation of compressed 
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samples. 


1.1 Literature Review 

In-situ reduction of large computational datasets has recently been the subject of multiple 
research efforts. In the ISABELA project [1], the dataset is sorted as a vector and encoded 
with B-splines while computing the resnlting associated errors. DIRAQ is a more recent 
effort by the same research group [2]. It is a faster parallel in-situ data indexing machinery 
that enables an efficient query of the original data. ISABELA and DIRAQ suffer from 
low compression ratios (~ 4 — 6x) which might not be sufficient for exascale datasets 
where I/O will be a more critical bottleneck. The zfp project, [3], works by compressing 
3D blocks of floating-point field data into a variable-length bitstream. The zfp approach 
resnlts in compression rates ranging between one and two orders of magnitude but it is 
limited to regular grids. However, our CS approach compresses unstructured data and we 
expect to obtain compression ratios ranging between one and three orders of magnitude. 

In-situ visualization [4] and feature extraction [5] are also ongoing research efforts that 
use combustion simulation datasets from the S3D simulator developed at Sandia National 
Labs (SNL). Selected simulation outputs are analyzed and visualized in-situ. Data anal¬ 
ysis occurs at separate computational nodes incurring a significant overhead. Moreover, 
such in-situ techniques require pre-selected outputs and cannot be used interactively since 
most clusters are operated in batch mode. Similar techniques have been implemented in 
the Paraview coprocessing library [6]. 

Sampling techniques are also employed in-situ for data reduction. Woodring et al. [7] 
devised such an approach for a large-scale particle cosmological simulation. The major 
feature in this work is the low cost of the offline data reconstruction and visualization. 
However, the compression ratios are low and require skipping levels in the simulation data. 
Sublinear sampling algorithms are also proposed for data reduction [8]. Their success has 
been proven in the analysis of stationary graphs. They are an ongoing effort at SNL to 
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transfer sublinear sampling theory into practice with focus on large combustion datasets. 


2 Mathematical Background 

2.1 Compressive Sensing 

Compressive Sensing (CS) [9] was first proposed for image compression based on the 
premise that most fields on lattices (images) can be sparsely represented using first gener¬ 
ation wavelets viz., a A^-pixel image can be well-approximated with K judiciously chosen 
wavelets, K N. The non-zero wavelets i.e., the sparsity pattern, are unknown a priori. 
Compression can be cast mathematically as the sparse sampling of the dataset in trans¬ 
form domains. Compressive samples are obtained by projecting the image on random 
vectors [10], 

y = $/, (1) 

where, y G are the compressed samples, / G is the field we are sampling, and 
$ G is the sampling matrix. Theoretically, only C ■ K ■ \og 2 {N/K) samples are 

necessary for reconstruction, where C is a constant that depends on the data [9]. 

In practice, reconstruction is posed as a linear inverse problem for the wavelet coeffi¬ 
cients s G conditioned on compressive samples and defined as, 

/ = ( 2 ) 

with T being the wavelet basis matrix. Then the inverse problem is stated as, 

y = $\l/s = As, (3) 

and we must find s from samples y obtained by compression. Regularization is provided 
by a Li norm of the wavelet coefficients, which also enforces sparsity. The scalability of 
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the inverse problem solvers (called shrinkage regression) has only been recently addressed 
[11], The random vectors in $ are designed to be incoherent with the chosen wavelets. 
Incoherence ensures that the amount of wavelets information carried by the compressive 
samples is high. Different types of sampling matrices $ can be used to perform the 
compression [10] (see Eq. 1). In our work, we use the Bernoulli matrix due to its superior 
incoherence properties with most basis sets T. 

2.2 Second Generation Wavelets 

Wavelets can represent a basis set that can be linearly combined to represent multiresolu¬ 
tion functions. Wavelets have compact support and encode all the scales in the data (j) 
as well as their location {k) in space. There are two types of wavelets: First generation 
wavelets (FGW) and second generation wavelets (SGW), [12,13]. To date, all GS work of 
which we are aware is based on first generation wavelets which only accommodate data 
defined on regular grids. FGWs are characterized by scaling functions (f){x) to perform 
approximations and 'ip{x) to find the details in a function / [14]. These are defined at all 
levels j in the hierarchy and span all locations k in the regular grid. They are computed in 
terms of the so-called mother wavelet 00 which provides the approximation at the largest 
level j = 0. At each level j, the scaling and detail functions are defined as. 


- k), (4) 

kez 

= XI - k), (5) 

where and bk are constrained to maintain orthogonality. Each new basis function 
models a finer resolution detail in the space being spanned. Regular grids are dyadic and 
the maximum number of levels jy^ax is equal to \og 2 {N) where N is the number of grid 
points in each dimension. 

In this work, we examine compression on unstructured data or point clouds which are 
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not well represented by FGWs [12], For these nnstructnred data we use the SGWs of 
Alpert et ah, [15,16] referred to as multi-wavelets. There are two major differences be¬ 
tween FGWs and SGWs. The first difference is that the maximum number of levels jmax, 
is computed by recursively splitting the non-dyadic mesh into different non-overlapping 
groups, thereby forming a multiscale hierarchy [12]. Since SGWs do not require dyadic 
grids, they can accommodate finite intervals and irregular geometries. The second dif¬ 
ference is that in place of one scaling function, there are several, • • • , defined 

over groups of the space. In addition, the functions themselves are defined by the dis¬ 
crete set of points, as opposed to a continuous representation. The discrete Alpert 
wavelets are polynomials, are quick to compute and are represented in the matrix, 

T. Details on how to compute the matrix T are given in [15]. 

2.3 Reconstruction 

Much of the work in CS is handled by the reconstruction phase which uses wavelet basis 
to reconstruct the data set from the compressed samples. Here we use a greedy algo¬ 
rithm, Stagewise Orthogonal Matching Pursuit (StOMP) [17], that has been empirically 
demonstrated to be very efficient. 

The reconstruction process can be described as follows. We have an underdetermined 
linear system, 

y = (6) 

given y and the matrix product A = d>T, where d> is our sampling matrix and T is the 
wavelet basis matrix as discussed in Section 2.1. We need to find s. If d> and T exhibit 
low mutual coherence and s is sparse in T i.e. has few non-zero elements, then StOMP 
can efficiently provide a solution. StOMP finds the nonzero elements of s through a series 
of increasingly correct estimates. Starting with an initial estimate Sq = 0 and an initial 
residual Tq = y, the algorithm maintains estimates of the location of the nonzero elements 
of s. 
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StOMP finds residual correlations, 



( 7 ) 


and uses these to find a new set of nonzero entries 


Jn {j • |^n(j)| ^ • 


( 8 ) 


where cr„ assumes a Gaussian noise on each entry and tn is a threshold parameter we 
provide in order to assess which coefficients have to be retained. Elements above the 


threshold are considered nonzero entries and added to the set J„. The new set of entries 


Jn are added to the current estimate of nonzero entries = /„_i U J„, and used to give 
a new approximation of and residual r„, 



(9) 


and 


( 10 ) 


Tn^y- As. 


3 Implementation 

In order to test the procedure, we have built a processing pipeline that both samples and 
reconstructs data from a dataset. It allows us to experiment with existing data sets and 
assess reconstruction quality. In the final implementation, the library will be split into 
two pieces. The in-situ piece consists of a small sampling codebase. It stores the samples, 
mesh points and connectivity, and the seed used to generate the Bernoulli sampling matrix 
$. During the in-situ processing the sampling matrix is not constructed explicity, but 
used implicitly to generate the sampled data. No wavelet computation is required at this 


stage. 
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The reconstruction side is responsible for rebuilding the sampling matrix and con¬ 
structing the wavelet matrix from the mesh data. By providing those matrices to StOMP, 
we are able to reconstruct the wavelet sampled data, s in Figure 1, and then inverse trans¬ 
form s to reconstruct the original data. 


Computational Cluster 



• Compressed samples y 

• Random Matrix seed 

• Mesh 



Paraview 

filter 


Interactive 
analysis and 
visualization 


I. Spatial compression (online): 

do y=0.X (matrix-free) 

II. Reconstruction (offline, Paraview): 

solve y=CZ).4^.s and do x=4^'.s 


Figure 1: A schematic showing the major steps during the in-situ data compression using 
CS (I) and the offline reconstrnction and visnalization of the original dataset (II) nsing 
Paraview. The random seed, the mesh and the compressed samples y are transferred to 
the visualization platform where the reconstruction of x takes place. 


We have implemented the reconstruction procedure in two ways: 1) in Matlab in order 
to experiment and prodnce some of the resnlts shown below, and 2) a more prodnction- 
oriented version written using the Trilinos sparse linear algebra library [18] and ParaView 
for visualization [19]. 


4 Results 


In this section we discuss the compression capability and reconstruction quality of our 
method. We also describe some aspects of its practical implementation. We consider two 
types of data defined on nnstrnctured meshes. The first type are “toy problems” where 










we assume mathematical functions defined on an irregular two-dimensional geometry. 
These datasets are small, and we can compress and reconstruct them on one processor. 
The second type are larger datasets obtained from simulations. In this case, we consider 
unstructured three-dimensional meshes distributed among many processors. 

4.1 Two-dimensional datasets 

We consider a square geometry with randomly chosen holes such that it constitutes an ir¬ 
regular geometry. We discretize it using a triangular mesh consisting of = 33,067 nodes. 
We assume two functions / and g given in Eqs. 11 and 12 as the datasets represented on 
the obtained mesh. We choose these functions such that / exhibits multiple oscillations 
and reveals more features than g. The motivations behind this choice is that we would 
like to explore the effect of data features on the number of samples y required during 
compressed sensing, that are necessary to accurately reconstruct the original dataset. 


/ = 48sin(87ra:)sin(77ry)sin(67rx) (11) 

g = 12sin(27ra:) [4sin(27rx) — 4sin(27ry)] (12) 

We compress / e and g G using a random Bernoulli matrix $ G as 

described in Section 2.1. We select an Alpert wavelet basis and reconstruct / using the 

StOMP algorithm described in Section 2.3. Both compression and reconstruction are 

performed in a serial run. We denote the reconstructed fields by and respectively. 
They are plotted in Figure 2 which shows that the original and reconstructed datasets are 
visually identical. The function / that reveals many features; a minimum compression 
ratio of i? = 10 was required for its accurate reconstruction. However, g affords a higher 
compression ratio R = 30 since it has fewer features which incur more data redundancy. 
This result indicates the possibility to predict the compression ratio in terms of the global 
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gradient. This latter increases with the number of features in a given dataset. Such 
prediction of the compression ratio is extremely important in-situ since the compressibility 
of the 





Figure 2: Plots showing (left column) smooth datasets represented on a complex geometry 
and an unstructured mesh of 33,067 nodes, and (right column) their reconstructed version 
from compressed samples, at full wavelet detail. The top and bottom rows denote two 
different datasets given by Eqs. 11 and 12, respectively. The compression ratios are i? = 10 
and R = 30 for / and g, respectively. The bases for reconstruction are Alpert wavelets 
with an order w = 5. 

In order to obtain a quantitative description of the reconstruction accuracy, we eval¬ 
uate the global L 2 norm of difference between the original and reconstructed versions. 
Figure 3 shows this error for / and p as a function of the wavelet order for different 
compression ratios. As expected, the error is lower for low compression ratios. We notice 
that there exists a minimum compression ratio required to obtain a low reconstruction 
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error i.e. a correct reconstruction. For example, the compression ratio should be smaller 
than 30 for the dataset g. This is consistent with the Donoho chart [17] which depicts a 
discontinuity in the compression ratio range that guarantees an accurate reconstruction. 



Figure 3: Plots showing the normalized L 2 error between the original and reconstructed 
datasets /'' (left) and g'^ (right) plotted in Figure 2 as a function of the wavelet order for 
different compression ratios, as indicated. 


The error decreases with the wavelets order w. The decreasing trend is reversed at 
larger values of w. This is mainly attributed to the over-fitting of the function / by the 
high order Alpert polynomial wavelets. It is therefore preferred to choose lower orders. 
According to the plots in Figure 3, w = 5 is an optimum value for the wavelet order that 
minimizes the reconstruction error and prevents over-fitting of the given function in the 
dataset. 



Figure 4: Plots showing (left) the normalized L 2 error and (right) the time taken in a 
serial run to perform the reconstruction of the dataset, as a function of the wavelet detail 
level j ior w = 5. These results are reported using with and without the CLOD approach, 
as indicated. 


The results presented so far are reported at the full wavelets detail level. We turn our 
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attention now to the effect of the detail level on the reconstructed function quality. Such 
analysis is useful for datasets that exhibit multiple features. By construction, wavelets 
are able to represent all scales in a function where a scale j reveals a level of detail as 
described in Section 2.2. In this work, we find the number of detail levels by performing 
orthogonal splits along each axis [13] which results in jmax = 11 for w = 5 for the mesh 
representing /. The wavelet matrix T is computed as the product of different detail level 
sparse matrices Tj [15,16] following: 

J—Jmax 

>!>= n ( 13 ) 

i=i 

T can be computed at any 1 < j < jmax and used to perform a StOMP reconstruc¬ 
tion. The sparsity of T decreases with j to encode more details. Figure 5 shows the 
reconstructed function /*' at different detail levels j. By scanning the figure, we notice 
how the fine details in the function are revealed. Changes between functions reconstructed 
at two consecutive detail levels decrease with j, indicating convergence. At j = 6, the 
function is visually identical to the original / in Figure 2. 

For each detail level, we consider two reconstruction approaches. In the first approach, 
the initial guess of the StOMP reconstruction is the same and assumes wavelet modes 
equal zero. In the second approach, an initial guess at a detail level j is obtained from 
the solution at level j — 1. Thus, we call this approach a continuous level of detail 
(CLOD) reconstruction. Figure 4 (left) shows the reconstruction error of the dataset / as 
a function of the wavelet detail level. For both approaches, the error decreases with the 
detail level, as expected. However, the CLOD approach results in an error about three 
times lower at the finest details. This is due to the accumulation of knowledge in the 
reconstructed /’’ with consecutive detail levels. Figure 4 (right) shows the reconstruction 
time required at each level. Using CLOD, the relative time taken between two consecutive 
levels decreases with j due to updated initial guess resulting in a decreased number of 
StOMP iterations. However, the cumulative time is substantially higher than the case 
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Figure 5: Plots showing the dataset in Figure 2 reconstructed at different wavelet detail 
levels j, as indicated. Results are generated from samples compressed with i? = 10 that 
are reconstrncted with an Alpert wavelet basis of order w — 5. 


without CLOD. The additional time required by CLOD to reach a smaller reconstruction 
error can be alleviated by skipping the reconstruction at some levels e.g. performing the 
CLOD reconstruction at the odd numbers levels. 


4.2 Three-dimensional distributed datasets 

In this section, we consider a larger dataset represented on a three-dimensional tetrahe¬ 
dral mesh with 396,264 points. The data is the temperature field obtained by a transient 
heat condnction simulation. The cylindrical geometry constitutes several sub-domains of 
different solid materials. The sub-domains sizes, heat conductivity and heat generations 
rates are chosen in a random fashion which results in a heterogeneous temperature distri¬ 
bution. Initially, the temperatnre is nniform across the domain, after which, it evolves to 
a steady state. The simulation is performed in parallel on 16 processors and the datasets 
is split equally among the processors ( 24,766 point per processor). The compression is 
performed locally on each processor i.e. the matrix-vector product in Eq. 1 was per- 
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formed serially on each processor with no communications with other processors. Doing 
so preserves an efficient in-situ compression, and a faster and memory-efficient StOMP 
reconstruction on the visualization workstation. The whole dataset can be recovered by 
assembling the different reconstructed portions. Figure 6 shows the original and recon¬ 
structed versions of the steady state temperature fields for different portions of the dataset 
corresponding to different processors. For a compression ratio R = 10, the reconstructed 
and original datasets are in a visual agreement. 



Figure 6: Plots showing the steady state temperature distributions in different portions 
of the three-dimensional (top) original and (bottom) reconstructed dataset T. Results 
are reported from a parallel simulation on 16 processors where compression is performed 
locally with a ratio R = 10. The vertical panels correspond to almost equal portions of the 
full dataset as distributed among different processors, as indicated. The reconstruction is 
performed using StOMP for a wavelet order tc = 5 at full detail. 


Finally, we report in Figure 7 (left) the evolution of the reconstruction error as a 
function of time for a constant compression ratio R =.10 The error is initially large since 
at earlier times, large gradients exist in the dataset. As heat diffuses in the domain, the 
temperature field becomes smoother and better represented by the wavelet bases. It leads 
to a smaller reconstruction error. Overall, the error decreases with larger wavelet orders as 
expected. We notice a saturation in the error and even a slightly increased error for higher 
orders mainly at early times. Similar to the test cases in Section 4.1, this is due by the over¬ 
fitting of the high gradients in the temperature field by the higher order Alpert wavelets. 
These results suggest that local large gradients form discontinuities and contribute to the 
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overall reconstruction error. Therefore, when predicting the compression ratio in-situ, the 
local gradients have to be accounted for along the global gradient discussed in Section 4.1. 




Time 


Figure 7: Plots showing the variation of the reconstruction error in the dataset T as a 
function of time. Results are reported from a parallel simulation on 16 processors where 
compression is performed locally with a ratio R = 10. The reconstruction is performed 
using StOMP at full detail for different wavelet orders w, as indicated. 


5 Conclusion 

In this paper we have demonstrated an application of compressive sensing to unstructured 
mesh data. We used second generation wavelets to efficiently represent the irregularities 
present in the spatial geometries and meshes. We are able to achieve lossy compression 
ratios of between 10 and 30 on fields defined on these meshes, depending on the oscilla¬ 
tions and features present in the data. The visual deterioration as a result of the lossy 
compression at those rates is minimal. Large gradients and discontinuities in the data 
also contribute in assessing the reconstruction quality We explored continuous level of 
detail reconstruction for datasets exhibiting many features and found that it results in a 
lower reconstruction error at the expense of an increased computational cost. We con¬ 
tinue to explore ways to improve the algorithms used here in terms of reconstruction time 
and streaming. It may also be the case that other wavelet and sampling matrix pairs 
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and reconstruction algorithms will produce better results on some data. We continue to 
investigate these as well. 
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